UNNT: A novel Utility for comparing Neural Net and Tree-based models

The use of deep learning (DL) is steadily gaining traction in scientific challenges such as cancer research. Advances in enhanced data generation, machine learning algorithms, and compute infrastructure have led to an acceleration in the use of deep learning in various domains of cancer research, such as drug response problems. In our study, we explored tree-based models to improve the accuracy of a single drug response model and demonstrate that tree-based models such as XGBoost (eXtreme Gradient Boosting) have advantages over deep learning models, such as a convolutional neural network (CNN), for single drug response problems. However, comparing models is not a trivial task. To make training and comparing CNNs and XGBoost more accessible to users, we developed an open-source library called UNNT (A novel Utility for comparing Neural Net and Tree-based models). The case studies in this manuscript focus on cancer drug response datasets; however, the application can be used on datasets from other domains, such as chemistry.

The comment we made regarding the applicability of the methods to chemistry was to indicate that, more than the field of study, the format of the data is what matters for the models. In fact, our datasets also contain chemistry data. The drug descriptor dataset we use for training consists of the chemical features of the drugs in the dataset. Each drug is represented by a series of descriptors, which become features in our model. In addition, the gene expression dataset encompasses genomic information regarding each of the cell lines.
When it comes to the application of this method to classification, the nature of the problem, where we predict the AUC value, made us focus on regression. This can easily be converted to a classification problem if the problem allows us to convert the predicted variable to categorical values. In our case, we didn't have that option.
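To illustrate the kind of conversion described above, the following is a minimal sketch of how a continuous AUC target could be binned into two classes, after which precision and recall become computable. The 0.5 cutoff, the sample values, and the function names `binarize_auc` and `precision_recall` are assumptions made for illustration only; they are not part of UNNT and were not used in our study.

```python
import numpy as np

def binarize_auc(auc_values, threshold=0.5):
    """Map continuous AUC values to labels: 1 if value >= threshold, else 0.

    The threshold is a hypothetical choice, not one taken from the study.
    """
    auc = np.asarray(auc_values, dtype=float)
    return (auc >= threshold).astype(int)

def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary 0/1 labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Made-up measured and predicted AUC values, for illustration only.
measured = [0.2, 0.6, 0.8, 0.4]
predicted = [0.3, 0.4, 0.9, 0.6]

true_labels = binarize_auc(measured)
pred_labels = binarize_auc(predicted)
precision, recall = precision_recall(true_labels, pred_labels)
```

Whether such a cutoff is meaningful depends on the problem; where no natural threshold exists, the regression formulation is the appropriate one.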
Reviewer #2: Dear Authors, The manuscript "UNNT: A novel Utility for comparing Neural Net and Tree-based models" presents a comparison between a decision tree-oriented method and deep learning networks in the medical field. The topic is interesting given the profusion of deep learning techniques in scientific works in different areas of application.
Regarding the article, check the punctuation, especially the use of commas. When an acronym is inserted for the first time in the text, its meaning should be presented. Pay attention to long paragraphs as they tend to make reading confusing. I have marked in the comments of the digital file the passages where the writing should be improved. Please revise and delete repetitive information from the text.
These comments were helpful and the changes we made to address them are in red text. The changes made in response to the comments below are also in red.
I would like to draw your attention to the following recommendations:
1) Lines 59 to 81: cite the bibliographical references that provide support for the information presented.

See references 7 and 8.
2) Line 102: "UNNT splits that data into training, validation, and testing sets": present the percentages for training, validation, and testing. Don't forget to detail the method used to draw the samples.
Addressed around line 110.
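For readers of this response, the kind of split the reviewer asks about can be sketched as follows. This is an illustration under stated assumptions: the 60/20/20 fractions, the random shuffling, and the function name `split_indices` are placeholders and are not the percentages or sampling method reported in the manuscript.

```python
import numpy as np

def split_indices(n_samples, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts.

    The fractions and the seed are hypothetical defaults for illustration.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_frac))
    n_val = int(round(n_samples * val_frac))
    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]
    return train_idx, val_idx, test_idx

# Example: 100 samples split 60/20/20 with no overlap between the parts.
train_idx, val_idx, test_idx = split_indices(100)
```

Fixing the seed makes the split reproducible, which matters when two model families are to be compared on identical data.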
3) Line 105: "can be set, for both CNN and XGBoost models". At this point, please note that the configuration of the convolutional network is much more complex than that of the Random Forest or CART methods.
Made a note mentioning that in line 105.
4) Please detail which type of decision tree associated with the XGBoost method was used in the study.

XGBoost uses gradient-boosted decision trees, which belong to the broad class of Classification and Regression Tree (CART) methods. We gave more details on how they differ from a single decision tree, especially during splitting, around lines 53-59.
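The idea behind gradient boosting can be sketched in a few lines. The example below is a simplified illustration, not XGBoost's implementation: it fits depth-1 regression trees (stumps) to the residuals of a squared-error loss on a single feature, whereas XGBoost additionally uses second-order gradient information and regularization. All function names and the toy data are hypothetical.

```python
import numpy as np

def fit_stump(x, residual):
    """Find the threshold on a 1-D feature that minimizes squared error."""
    best = None
    for t in np.unique(x)[:-1]:  # drop the largest value so the right side is never empty
        left = residual[x <= t]
        right = residual[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, left_value, right_value = best
    return t, left_value, right_value

def predict_stump(stump, x):
    """Return the leaf value on whichever side of the threshold x falls."""
    t, left_value, right_value = stump
    return np.where(x <= t, left_value, right_value)

def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Add stumps one at a time, each fitted to the current residuals."""
    base = y.mean()
    pred = np.full(x.shape, base, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        residual = y - pred  # negative gradient of the squared-error loss
        stump = fit_stump(x, residual)
        pred += learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, learning_rate, stumps

def predict(model, x):
    """Sum the base value and the scaled contributions of all stumps."""
    base, learning_rate, stumps = model
    pred = np.full(x.shape, base, dtype=float)
    for stump in stumps:
        pred += learning_rate * predict_stump(stump, x)
    return pred

# Toy step-function data, made up for illustration.
x = np.linspace(0.0, 1.0, 20)
y = np.where(x < 0.5, 0.2, 0.8)
model = boost(x, y)
max_error = np.abs(predict(model, x) - y).max()
```

A single decision tree chooses its splits once on the raw target; here each new tree is fitted to what the ensemble so far still gets wrong, which is the difference in splitting behavior that the response refers to.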
5) Is it possible to run the solution developed in the Cloud? It would also be interesting to point out that a very specific hardware resource was used, which may be beyond the reach of many researchers.
Yes, the instructions provided with the GitHub repository should be sufficient to run UNNT in the cloud, as long as the user installs Anaconda, which is fairly trivial on most systems. The hardware used in this study can easily be requisitioned on a cloud platform to replicate the environment of the systems on which we conducted this study.
One unique environment on which we tested our code, but did not include in the study, is the Ookami testbed system at Stony Brook University, which contains the ARM-based A64FX processor. It's an accelerator that behaves like a CPU from a software perspective and often requires fewer code changes than porting to NVIDIA GPUs. This processor is used in the Fugaku system in Japan, which until recently led the TOP500 list. But we used this system purely for testing and it's not a pre
6) Line 135: add as supplementary information at the end of the article.
Thank you, we missed this detail in the submission guidelines. Now in the supplementary materials section.
7) Cite in the text the flowchart of the steps implemented, which is presented at the end of the article.
Mentioned in Design and Implementation section to reference the image in the supplementary materials section. Note: Upon request from the journal, Figure 1, referenced in the manuscript, was removed from the supplementary materials document and only uploaded individually.
8) Results: it would be interesting to also present the overall accuracy and precision values. To diversify the presentation of the results, it would be interesting to present graphs, especially the AUC.
We wanted to show accuracy, precision, and recall values for the reason you specified: these metrics can be represented graphically for readers. However, we didn't see how we could generate those metrics without turning this from a regression model into a classification model.
9) Still on the results obtained, it would be interesting to compare them with the results obtained by the authors cited in the references.